A Robust Framework for Web Information Extraction and Retrieval

Related resources

A Data Extraction and Visualization Framework for Information Retrieval Systems

In recent years we have witnessed continuous growth in the amount of data that both public and private organizations collect and profit from. Search engines are the most common tools used to retrieve information, and more recently, clustering techniques have proved to be an effective tool for helping users skim query results. The majority of the systems proposed to manage information provide text...

A Framework for Automatic Document Understanding for Web Information Retrieval

Most web search engines use a keyword-based approach to search for needed information on the web. When the user submits a query to the search engine, the web crawler tries to match the keywords against the file name, URL, or meta tags of the documents. Because of this, the user may get many non-relevant documents along with the relevant ones. It can lead to the frustration of informati...

PolyUHK: A Robust Information Extraction System for Web Personal Names

Personal information extraction is an important component of advanced information retrieval. Two problems need to be solved in this practical task: personal name ambiguity and extraction of personal information for a specific person. For personal name ambiguity, which is a very common phenomenon on the fast-growing Web, we propose a robust system which extracts features wit...

OCR++: A Robust Framework For Information Extraction from Scholarly Articles

This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written ...

A Framework for Decentralized Ranking in Web Information Retrieval

Search engines are among the most important applications or services on the web. Most existing successful search engines use global ranking algorithms to rank the crawled documents in their databases. However, global ranking of documents has two potential problems: high computation cost and potentially poor rankings. Both of the problems are related to the centralized computation...

Journal

Journal title: International Journal of Machine Learning and Computing

Year: 2014

ISSN: 2010-3700

DOI: 10.7763/ijmlc.2014.v4.403